Extending Rule-based Classifiers to Improve Recognition of Imbalanced Classes

Authors

  • Jerzy Stefanowski
  • Szymon Wilk
Abstract

This paper deals with inducing rule-based classifiers from imbalanced data, where one class (the minority class) is under-represented in comparison to the remaining classes (the majority classes). We discuss the reasons why standard classifiers are biased toward recognizing examples from the majority classes and misclassify examples from the minority class. To avoid the limitations of sequential covering approaches, we present a new approach to improving the sensitivity of a rule-based classifier. It modifies the structure of the rule sets: minimal sets of rules are still induced for the majority classes, while the rule set for the minority class is generated by an algorithm called EXPLORE. This algorithm produces rules that are more general and supported by more learning examples than rules from the minimal set. The usefulness of the new approach is verified in comparative experiments on several imbalanced data sets.
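
The bias described in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of sensitivity as the fraction of minority-class examples that are correctly recognized. The toy labels, the class names "maj" and "min", and the helper functions are hypothetical and are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of all examples whose predicted class matches the true class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def sensitivity(y_true, y_pred, minority):
    """Fraction of minority-class examples that are correctly recognized."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    total = sum(1 for t in y_true if t == minority)
    return hits / total if total else 0.0


# Hypothetical imbalanced data: 95 majority examples and 5 minority examples.
y_true = ["maj"] * 95 + ["min"] * 5
# Hypothetical predictions of a classifier biased toward the majority class:
# it recognizes only one of the five minority examples.
y_pred = ["maj"] * 95 + ["min"] + ["maj"] * 4

acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred, "min")
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

On this toy data the overall accuracy is high although most minority examples are missed, which is why sensitivity rather than accuracy is the quantity the abstract aims to improve.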

Similar resources

A New Genetic-Programming-Based Method for Weighting Fuzzy Rules in Imbalanced Classification

In classification problems, we often encounter datasets in which classes hold very different shares of the patterns (i.e. classes with a high pattern percentage and classes with a low pattern percentage). These problems are called “classification problems with imbalanced data sets”. Fuzzy rule-based classification systems are the most popular fuzzy modeling systems used in pattern classification problems. Rule weights...

Extending rule based classifiers for dealing with imbalanced data

Many real world applications involve learning from imbalanced data sets, i.e. data where the minority class of primary importance is under-represented in comparison to majority classes. The high imbalance is an important obstacle for many traditional machine learning algorithms as they are biased towards majority classes. It is desired to improve prediction of interesting, minority class exampl...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Improving Rule-Based Classifiers Induced by MODLEM by Selective Pre-processing of Imbalanced Data

In the paper we discuss inducing rule-based classifiers from imbalanced data, where one class (a minority class) is under-represented in comparison to the remaining classes (majority classes). To improve the ability of a classifier to recognize this class, we propose a new selective pre-processing approach that is applied to data before inducing a rule-based classifier. The approach combines se...

On Mining Fuzzy Classification Rules for Imbalanced Data

A fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...

Publication date: 2008